73 research outputs found

Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

Authors: Beskow Jonas; Salvi Giampiero; Stefanov Kalin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, thus complying with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting. However, in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.

Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

NORA - Norwegian Open Research Archives

Interactive Robot Learning of Gestures, Language and Affordances

Authors: Bernardino Alexandre; Jamone Lorenzo; Salvi Giampiero; Saponaro Giovanni
Publication venue: International Speech Communication Association
Publication date: 01/01/2017

A growing field in robotics and Artificial Intelligence (AI) research is human-robot collaboration, whose target is to enable effective teamwork between humans and robots. However, in many situations human teams are still superior to human-robot teams, primarily because human teams can easily agree on a common goal with language, and the individual members observe each other effectively, leveraging their shared motor repertoire and sensorimotor resources. This paper shows that for cognitive robots it is possible, and indeed fruitful, to combine knowledge acquired from interacting with elements of the environment (affordance exploration) with the probabilistic observation of another agent's actions. We propose a model that unites (i) learning robot affordances and word descriptions with (ii) statistical recognition of human gestures with vision sensors. We discuss theoretical motivations and possible implementations, and we show initial results which highlight that, after having acquired knowledge of its surrounding environment, a humanoid robot can generalize this knowledge to the case when it observes another agent (human partner) performing the same motor actions previously executed during training.

Comment: code available at https://github.com/gsaponaro/glu-gesture

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Cluster Analysis of Differential Spectral Envelopes on Emotional Speech

Authors: Cosi Piero; Salvi Giampiero; Tesser Fabio; Zovato Enrico
Publication venue: ISCA-INST SPEECH COMMUNICATION ASSOCIATION

This paper reports on the analysis of the spectral variation of emotional speech. Spectral envelopes of time-aligned speech frames are compared between emotionally neutral and active utterances. Statistics are computed over the resulting differential spectral envelopes for each phoneme. Finally, these statistics are classified using agglomerative hierarchical clustering and a measure of dissimilarity between statistical distributions, and the resulting clusters are analysed. The results show that there are systematic changes in spectral envelopes when going from neutral to sad or happy speech, and that those changes depend on the valence of the emotional content (negative, positive) as well as on the phonetic properties of the sounds, such as voicing and place of articulation.
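The clustering step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which dissimilarity measure or linkage rule the authors used, so the choices here are assumptions made only for illustration: each phoneme is summarised by a one-dimensional Gaussian (mean, variance), dissimilarity is the symmetric Kullback-Leibler divergence, and clusters are merged with average linkage. The function names and the toy statistics are hypothetical.

```python
import math
from itertools import combinations


def sym_kl_gauss(a, b):
    """Symmetric KL divergence between two 1-D Gaussians given as (mean, variance)."""
    (m1, v1), (m2, v2) = a, b
    kl_ab = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_ba = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_ab + kl_ba


def agglomerate(stats, n_clusters):
    """Average-linkage agglomerative clustering over a dict: label -> (mean, variance)."""
    clusters = [[label] for label in stats]

    def linkage(c1, c2):
        # Average pairwise dissimilarity between the members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(sym_kl_gauss(stats[x], stats[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Made-up per-phoneme statistics (not data from the paper).
toy_stats = {"a": (1.0, 0.5), "e": (1.1, 0.6), "s": (-2.0, 0.4), "f": (-2.2, 0.5)}
result = sorted(agglomerate(toy_stats, 2))
print(result)
```

With these toy values the two vowel-like entries and the two fricative-like entries end up in separate clusters, which mirrors the kind of grouping by phonetic property that the abstract reports.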

PUblication MAnagement

S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

Authors: Adiban Mohammad; Salvi Giampiero; Siniscalchi Sabato Marco; Stefanov Kalin
Publication date: 13/07/2023

We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the ST-PixelCNN's ability at handling spatiotemporal information, S-HR-VQVAE can better deal with chief challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.

Comment: 14 pages, 7 figures, 3 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence on 2023-07-1
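As background for the "residual vector quantized" part of the model name, the following is a minimal sketch of generic residual vector quantization: each layer picks the nearest codeword to whatever the previous layers left unexplained. It is not the authors' HR-VQVAE, whose details the abstract does not give. The function names, the two tiny codebooks, and the input vector are all hypothetical.

```python
def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def residual_quantize(vec, codebooks):
    """Quantize vec layer by layer; each layer encodes the residual left by earlier layers."""
    residual = list(vec)
    recon = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        indices.append(k)
        # Accumulate the reconstruction and update what is still unexplained.
        recon = [r + c for r, c in zip(recon, codebook[k])]
        residual = [x - c for x, c in zip(residual, codebook[k])]
    return indices, recon


# Hypothetical two-layer codebooks: a coarse layer followed by a finer one.
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[-1.0, 0.0], [1.0, 1.0]],
]
indices, recon = residual_quantize([5.0, 5.0], codebooks)
print(indices, recon)
```

In this toy case the coarse layer selects the codeword (4, 4) and the finer layer encodes the remaining (1, 1), so a short list of indices is enough to describe the vector.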

arXiv.org e-Print Archive

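Hierarchical Analysis of the Differential Spectral Envelopes of an Emotional Voice (original title: Analisi gerarchica degli inviluppi spettrali differenziali di una voce emotiva)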

Authors: Cosi Piero; Salvi Giampiero; Tesser Fabio; Zovato Enrico
Publication venue: Bulzoni

This article describes a new method for analysing vocal timbre by studying the variations in the spectral envelope produced by the same speaker in emotionally neutral and expressive conditions. The analysis is based on a corpus from a single speaker who was instructed to read a set of sentences first in a neutral reading style and then in two emotional styles: a happy style and a sad style. The spectral envelopes of the time-aligned neutral and expressive (happy and sad) realisations are compared using a differential method. The differences were computed between the emotional state and the neutral one, so the two categories compared are neutral-happy and neutral-sad. The statistics of the differential envelopes were computed for each phone. The data were examined using agglomerative hierarchical clustering. The resulting clusters are validated with several measures of distance between statistical distributions and explored visually to find similarities and differences between the two categories. The results show systematic variations in vocal timbre related to the two sets of spectral envelope differences. These traits depend on the valence of the emotion considered (positive, negative) as well as on the phonetic properties of the particular phone, such as voicing and place of articulation.

PUblication MAnagement

User Evaluation of the SYNFACE Talking Head Telephone

Authors: Eva Agelfors; Giampiero Salvi; Inger Karlsson; Jonas Beskow; Neil Thomas
Publication date: 01/01/2006

Abstract. The talking-head telephone, Synface, is a lip-reading support for people with hearing impairment. It has been tested by 49 users with varying degrees of hearing impairment in the UK and Sweden, in lab and home environments. Synface was found to give support to the users, especially in perceiving numbers and addresses, and to be an enjoyable way to communicate. A majority deemed Synface to be a useful product.

CiteSeerX

Crossref

Bidirectional fluxes of spermine across the mitochondrial membrane.

The polyamine spermine is transported into the mitochondrial matrix by an electrophoretic mechanism whose driving force is the negative electrical membrane potential (ΔΨ). The presence of phosphate increases spermine uptake by reducing ΔpH and enhancing ΔΨ. The transport system is a specific uniporter constituted by a protein channel exhibiting two asymmetric energy barriers, with the spermine binding site located in the energy well between the two barriers. Although spermine transport is electrophoretic in origin, its accumulation does not follow the Nernst equation because of the presence of an efflux pathway. Spermine efflux may be induced by different agents, such as FCCP, antimycin A and mersalyl, which are able to completely or partially reduce the ΔΨ value and, consequently, suppress or weaken the force necessary to maintain spermine in the matrix. However, this efflux may also take place in normal conditions, when the electrophoretic accumulation of the polycationic polyamine induces a drop in ΔΨ sufficient to trigger the efflux pathway. The release of the polyamine is most probably electroneutral in origin and can take place in exchange with protons or in symport with the phosphate anion. The activity of both the uptake and efflux pathways induces a continuous cycling of spermine across the mitochondrial membrane, the rate of which may be prominent in setting the concentrations of spermine in the inner and outer compartments. Thus, this event has a significant role in the modulation of the mitochondrial permeability transition and consequently in the triggering of intrinsic apoptosis.
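For reference, the Nernst equation mentioned in this abstract gives the accumulation ratio that a purely electrophoretic uptake would reach at equilibrium. The notation below is an assumption introduced here, not taken from the abstract: $\Delta\Psi = \psi_{\mathrm{in}} - \psi_{\mathrm{out}}$ is the membrane potential (negative inside the matrix), $z$ is the net charge of the transported species, $F$ is the Faraday constant, $R$ is the gas constant, and $T$ is the absolute temperature.

```latex
\[
  RT \ln [\mathrm{Spm}]_{\mathrm{in}} + zF\,\psi_{\mathrm{in}}
  = RT \ln [\mathrm{Spm}]_{\mathrm{out}} + zF\,\psi_{\mathrm{out}}
  \quad\Longrightarrow\quad
  \frac{[\mathrm{Spm}]_{\mathrm{in}}}{[\mathrm{Spm}]_{\mathrm{out}}}
  = \exp\!\left(-\frac{zF\,\Delta\Psi}{RT}\right)
\]
```

For a polycation ($z > 0$) and a negative $\Delta\Psi$ the predicted ratio is greater than one, so the abstract's statement that accumulation does not follow this relation is consistent with an efflux pathway limiting the accumulation that is actually observed.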

Archivio Ricerca Ca'Foscari

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- Università di Roma La Sapienza

Archivio istituzionale della ricerca - Università di Padova